The integration of distributed, heterogeneous databases, such as those available on the
World Wide Web, poses many problems. Here we consider the problem of integrating data
from sources that lack common object identifiers. A solution to this problem is proposed for
databases that contain informal, natural-language "names" for objects; most Web-based
databases satisfy this requirement, since they usually present their information to the
end-user through a veneer of text. We des ...
The paper presents a similarity-based retrieval framework for a software repository that
aids the process of maintaining, understanding, and migrating legacy software systems
[12]. Designing a software repository involves three issues: (1) information content; (2)
information representation; and (3) strategies for accessing repository artifacts. Assuming
the architecture presented in [12] we extend the retrieval system to support imprecise
queries, iterative browsing, and diverse users. Because o ...
The need to automatically extract and classify the contents of multimedia data archives
such as images, video, and text documents has led to significant work on similarity based
retrieval of data. To date, most work in this area has focused on the creation of index
structures for similarity based retrieval. There is very little work on developing formalisms
for querying multimedia databases that support similarity based computations and
optimizing such queries, even though it is well known ...

Similarity-based algebra for multimedia database systems 
January 2001 Proceedings of the 12th Australasian conference on Database
technologies ADC '01 
In database management systems, the need to integrate content-based image retrieval
facilities has become one of the key issues. In this paper, we first illustrate the importance
of such facilities with example queries and give an overview of the works done in
similarity-based data retrieval. Then, we propose an image repository model that supports
similarity-based operations on feature vector representations of images. Moreover, we
introduce a new similarity-based algebra on image tables. Thus, ...
We study a set of linear transformations on the Fourier series representation of a sequence
that can be used as the basis for similarity queries on time-series data. We show that our 
set of transformations is rich enough to formulate operations such as moving average and 
time warping. We present a query processing algorithm that uses the underlying R-tree 
index of a multidimensional data set to answer similarity queries efficiently. Our 
experiments show that the performance of this algorith ... 
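
As an illustration of why a moving average can be treated as a linear transformation of Fourier coefficients, the following minimal sketch uses the convolution theorem: a circular moving average in the time domain equals an element-wise multiplication of the discrete Fourier coefficients. This is not the algorithm of the paper summarized above, whose details are truncated here; the function names, the circular (wrap-around) window, and the sample series are all assumptions made for the example.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def idft(coeffs):
    """Inverse of dft() above (includes the 1/n normalization)."""
    n = len(coeffs)
    return [
        sum(coeffs[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def circular_moving_average(x, w):
    """w-point moving average computed directly, wrapping around the ends."""
    n = len(x)
    return [sum(x[(t - k) % n] for k in range(w)) / w for t in range(n)]


def moving_average_via_dft(x, w):
    """Same average, computed by scaling each Fourier coefficient (assumes w <= len(x))."""
    n = len(x)
    # Averaging window as a length-n kernel: w entries of 1/w, then zeros.
    kernel = [1.0 / w if k < w else 0.0 for k in range(n)]
    # Convolution theorem: multiply the spectra coefficient by coefficient.
    product = [a * b for a, b in zip(dft(x), dft(kernel))]
    return [value.real for value in idft(product)]


if __name__ == "__main__":
    series = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]  # hypothetical data
    direct = circular_moving_average(series, 3)
    spectral = moving_average_via_dft(series, 3)
    print(max(abs(a - b) for a, b in zip(direct, spectral)))
```

Because the averaging step is just a per-coefficient scaling, it can in principle be applied to the stored Fourier representation of a sequence without going back to the raw time series.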
Understanding distributed applications is a tedious and difficult task. Visualizations based on
process-time diagrams are often used to obtain a better understanding of the execution of
the application. The visualization tool we use is Poet, an event tracer developed at the
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 
Images are increasingly being embedded in HTML documents on the WWW. Such
documents over the WWW essentially provide a rich source of image collections from which
users can query. Interestingly, the semantics of these images are typically described by their
surrounding text. Unfortunately, most WWW image search engines fail to exploit these
image semantics and give rise to poor recall and precision performance. In this paper, we
propose a novel image representation model called Weigh ...
The problem of finding nearest neighbors to a query in a document collection is a special
case of associative retrieval, in which searches are performed using more than one key. A
nearest neighbors associative retrieval algorithm, suitable for document retrieval using
similarity matching, is described. The basic structure used is a binary tree; at each node a
set of keys (concepts) is tested to select the most promising branch. Backtracking to
initially rejected branches is allowed and ofte ...
The rapid growth of digital image data increases the need for efficient and effective image
retrieval systems. Such systems should provide functionality that tailors to the user's need
at the query time. In this paper, we propose a new image retrieval technique that allows
users to control the relevantness of the results. For each image, the color contents of its
regions are captured and used to compute similarity. Various factors, assigned
automatically or by the user, allow high recall an ...
Statistical clustering is critical in designing scalable image retrieval systems. In this paper,
we present a scalable algorithm for indexing and retrieving images based on region
segmentation. The method uses statistical clustering on region features and IRM
(Integrated Region Matching), a measure developed to evaluate overall similarity between
images that incorporates properties of all the regions in the images by a region-matching
scheme. Compared with retrieval based on individual ...
We describe a similarity calculation model called IFSM (Inherited Feature Similarity
Measure) between objects (words/concepts) based on their common and distinctive
features. We propose an implementation method for obtaining features based on abstracted
triples extracted from a large text corpus utilizing taxonomical knowledge. This model
represents an integration of traditional methods, i.e., relation based similarity measure and
distribution based similarity measure. An experiment, using our n ...
Most general content-based image retrieval techniques use colour and texture as main
retrieval indices. A recent technique uses colour pairs to model distinct object boundaries
for retrieval. These techniques have been applied to overall image contents without taking
into account the characteristics of individual objects. While the techniques work well for the
retrieval of images with similar overall contents (including backgrounds), their accuracies
are limited because they are unable to t ...
A new algorithm for document clustering is introduced. The base concept of the algorithm,
the cover coefficient (CC) concept, provides a means of estimating the number of clusters
within a document database and related indexing and clustering analytically. The CC
concept is used also to identify the cluster seeds and to form clusters with these seeds. It is
shown that the complexity of the clustering process is very low. The retrieval experiments
show that the information-retrieval effectiv ...

Keywords: cluster validity, clustering-indexing relationships, cover coefficient, decoupling 
coefficient, document retrieval, retrieval effectiveness 
A specific query establishes a rigid qualification and is concerned only with data that match
it precisely. A vague query establishes a target qualification and is concerned also with data 
that are close to this target. Most conventional database systems cannot handle vague 
queries directly, forcing their users to retry specific queries repeatedly with minor 
modifications until they match data that are satisfactory. This article describes a system 
called VAGUE that can handle vague queries ... 
The anomaly-detection problem can be formulated as one of learning to characterize the
behaviors of an individual, system, or network in terms of temporal sequences of discrete
data. We present an approach on the basis of instance-based learning (IBL) techniques. To
cast the anomaly-detection task in an IBL framework, we employ an approach that
transforms temporal sequences of discrete, unordered observations into a metric space via
a similarity measure that encodes intra-attribute depende ...

Keywords: anomaly detection, clustering, data reduction, empirical evaluation, instance 
based learning, machine learning, user profiling 
Traditional approaches for content-based image querying typically compute a single
signature for each image based on color histograms, texture, wavelet transforms etc., and
return, as the query result, images whose signatures are closest to the signature of the
query image. Therefore, most traditional methods break down when images contain similar
objects that are scaled differently or at different locations, or only certain regions of the
image match. In this pape ...